23/03/2025 - 29/03/2025

24/03/2025 17:31

Here is my "simplified" software diagram for the ATAR DAQ:

graph TD;
    NaluFrontend["<b><a href='https://github.com/PIONEER-Experiment/atar_daq'>Nalu MIDAS Frontend</a></b><br>Coordinates MIDAS event construction"]

    subgraph Libraries
        NaluBoardLib["<b><a href='https://github.com/jaca230/nalu_board_controller'>Nalu Board Controller</a></b><br>C++ Wrapper around naludaq methods for configuring the board and starting readout"]
        NaluEventCollectorLib["<b><a href='https://github.com/jaca230/nalu_event_collector'>Nalu Event Collector</a></b><br>C++ API for launching collector threads. Handles receiving data over UDP, processing packets, and collecting into NaluEvents"]
        MidasLib["<b><a href='https://bitbucket.org/tmidas/midas/src/develop/'>MIDAS</a></b><br>Data acquisition framework"]
        ReflectCppLib["<b><a href='https://github.com/getml/reflect-cpp'>reflect-cpp</a></b><br>C++ reflection library used for serialization"]
    end

    subgraph Python Packages
        NaludaqPython["<b><a href='https://pypi.org/project/naludaq/0.31.9/'>naludaq</a></b><br>Python interface for Naludaq"]
    end

    subgraph Classes
        NaluBoardController["<b><a href='https://github.com/jaca230/nalu_board_controller/blob/main/include/nalu_board_controller.h'>nalu_board_controller</a></b><br>Provides methods for configuring the board and starting readout"]
        NaluEventCollector["<b><a href='https://github.com/jaca230/nalu_event_collector/blob/main/include/nalu_event_collector.h'>nalu_event_collector</a></b><br>Provides methods for starting collector threads and polling for events"]
        OdbManager["<b><a href='https://github.com/PIONEER-Experiment/atar_daq/blob/main/include/odb_manager.h'>odb_manager</a></b><br>Handles initializing and managing ODB structure for Nalu Equipment"]
        MidasFrontend["<b><a href='https://bitbucket.org/tmidas/midas/src/develop/include/mfe.h'>mfe</a></b><br>Handles MIDAS frontend logic"]
    end

    %% Connect libraries to the PythonPackages layer
    NaluBoardLib -->|Pybind| NaludaqPython

    %% Connect libraries to the Classes layer
    NaluFrontend -->|Uses| NaluBoardLib
    NaluFrontend -->|Uses| NaluEventCollectorLib
    NaluFrontend -->|Uses| MidasLib
    NaluFrontend -->|Uses| ReflectCppLib
    NaluBoardLib -->|Provides| NaluBoardController
    NaluEventCollectorLib -->|Provides| NaluEventCollector
    MidasLib -->|Provides| MidasFrontend
    ReflectCppLib -->|Used By| OdbManager

9b5a56168930ce8e3f01ec37cd955568.png


25/03/2025 13:35

I identified "problematic" rate test parameters with this script

# Step 1: First filter based on Expected Data Rate
df_filtered_initial = df[df['Expected Data Rate (KB/s)'] < 55000].copy()

# Step 2: Define filtering conditions on this subset
condition_1 = df_filtered_initial['Collector Error'].notna() & (df_filtered_initial['Collector Error'] != "None")
condition_2 = ~df_filtered_initial['kBytes per sec'].div(df_filtered_initial['Expected Data Rate (KB/s)']).between(0.8, 1.4)
condition_3 = ~df_filtered_initial['Frequency (Hz)'].div(df_filtered_initial['Data Rate (Events per sec)']).between(0.9, 1.1)
condition_4 = df_filtered_initial['Frequency (Hz)'] > 1000  # Frequency must be above 1 kHz

# Step 3: Create a reason column to track which conditions were met
df_filtered_initial['Reason'] = ''

df_filtered_initial.loc[condition_1, 'Reason'] += 'Collector Error; '
df_filtered_initial.loc[condition_2, 'Reason'] += 'Data Rate Mismatch; '
df_filtered_initial.loc[condition_3, 'Reason'] += 'Frequency/Data Rate Mismatch; '

# Step 4: Apply the additional filtering conditions
filtered_df = df_filtered_initial[(condition_1 | condition_2 | condition_3) & condition_4].copy()

# Display row count and first few rows for verification
print(f"Filtered DataFrame has {filtered_df.shape[0]} rows.")
filtered_df[['File', 'Frequency (Hz)', 'Data Rate (Events per sec)', 
             'Windows', 'Events Sent', 'kBytes per sec', 
             'Active Channels Length', 'Expected Data Rate (KB/s)', 
             'Collector Error', 'Reason']]


# Define the output file path
output_file = "filtered_data.txt"

# Open the file and write each row in the specified format
with open(output_file, "w") as f:
    for _, row in filtered_df.iterrows():
        frequency = int(row['Frequency (Hz)'])
        windows = int(row['Windows'])
        channels = int(row['Active Channels Length'])
        computed_value = frequency * windows * channels
        
        # Format the line as: 0 0 0 {frequency} {windows} {channels} {computed_value}
        f.write(f"0 0 0 {frequency} {windows} {channels} {computed_value}\n")

print(f"Filtered data has been written to {output_file}")

So my criteria are:

  1. The expected data rate must be below 55 MB/s.

    1. This is the limit the board can output, so we don't expect good performance above this.
  2. Must be over a 1 kHz trigger rate.

    1. I only make this cut because the "expected data rate" calculation is poor at low event rates. Without this cut, I get a lot of data points that weren't actually problematic.
  3. The "normalized" data rate is outside the range [0.8, 1.4].

    1. So we differ from the expected data rate by a meaningful percentage.
  4. The "normalized" event rate is outside the range [0.9, 1.1].

    1. So we differ from the expected event rate (the external trigger rate) by a meaningful percentage.
  5. The run contained a collector error.

A run is flagged when it passes cuts 1 and 2 and meets at least one of 3, 4, or 5.


25/03/2025 13:44

Using the above criteria, I created a parameter space for the sequencer to step through. For each point in the parameter space I did a 1-minute-long run, sampling MIDAS's measured data rate and event rate every 4 seconds. So each "problematic parameter set" has 15 sequential samples.
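
The parameter file written by the script above (lines of the form `0 0 0 {frequency} {windows} {channels} {computed_value}`) can be read back into parameter tuples like this. This is just a sketch of the bookkeeping; the sequencer's actual input handling may differ:

```python
def read_parameter_file(path="filtered_data.txt"):
    """Parse lines of the form '0 0 0 {frequency} {windows} {channels} {computed}'."""
    params = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # First three fields are the fixed '0 0 0' prefix; last is the derived product.
            _, _, _, frequency, windows, channels, _ = map(int, fields)
            params.append((frequency, windows, channels))
    return params

# Each parameter set gets a 60 s run sampled every 4 s -> 15 samples per run.
RUN_SECONDS, SAMPLE_PERIOD = 60, 4
samples_per_run = RUN_SECONDS // SAMPLE_PERIOD
print(samples_per_run)  # → 15
```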

25/03/2025 13:43

After collecting this data, I computed means and uncertainties like so:

import numpy as np

# Function to compute Time to Stability
def time_to_stability(data_rates, tolerance=0.01, window=3):
    """Returns the first index where the data rate stabilizes within tolerance of the final value for a given run."""
    final_value = data_rates.iloc[-1]  # Assume the last value is steady-state
    threshold = final_value * (1 - tolerance)  # Define stability threshold
    
    for i in range(len(data_rates) - window + 1):  # +1 so the last full window is checked
        if np.all(data_rates.iloc[i:i+window] >= threshold):
            return i  # First index where stability is reached
    
    return len(data_rates)  # If it never stabilizes, return the full length

def count_collector_errors(errors):
    """Counts the number of non-'None' and non-'N/A' collector errors."""
    return errors[~errors.isin(['None', 'N/A'])].count()


# Define the aggregation functions for each column
agg_funcs = {
    'Frequency (Hz)': 'mean',  # No uncertainty needed
    'Data Rate (Events per sec)': [
        'mean',  # Mean
        lambda x: np.std(x) / np.sqrt(len(x)),  # Uncertainty (standard error)
        time_to_stability  # Compute stability time
    ],
    'Windows': 'mean',  # No uncertainty needed
    'Events Sent': 'max',  # Take the maximum value
    'kBytes per sec': [
        'mean',  # Mean
        lambda x: np.std(x) / np.sqrt(len(x))  # Uncertainty (standard error)
    ],
    'Active Channels Length': 'mean',  # No uncertainty needed
    'Expected Data Rate (KB/s)': 'mean',  # No uncertainty needed
    'Collector Error': count_collector_errors  # Count occurrences of actual errors
}

# Perform the groupby aggregation
consolidated_df = df.groupby('Run Number').agg(agg_funcs)

# Rename the columns for clarity
consolidated_df.columns = [
    'Avg Frequency (Hz)',
    'Avg Data Rate (Events per sec)', 'Uncertainty Data Rate', 'Time to Stability',
    'Avg Windows',
    'Max Events Sent',
    'Avg kBytes per sec', 'Uncertainty kBytes per sec',
    'Avg Active Channels Length',
    'Avg Expected Data Rate (KB/s)',
    'Collector Error Count'
]

# Compute the normalized values and their uncertainties
consolidated_df['Normalized Frequency'] = consolidated_df['Avg Data Rate (Events per sec)'] / consolidated_df['Avg Frequency (Hz)']
consolidated_df['Uncertainty Normalized Frequency'] = consolidated_df['Uncertainty Data Rate'] / consolidated_df['Avg Frequency (Hz)']

consolidated_df['Normalized kBytes per sec to Expected Data Rate'] = consolidated_df['Avg kBytes per sec'] / consolidated_df['Avg Expected Data Rate (KB/s)']
consolidated_df['Uncertainty Normalized kBytes per sec to Expected Data Rate'] = consolidated_df['Uncertainty kBytes per sec'] / consolidated_df['Avg Expected Data Rate (KB/s)']

# Reset index for better readability
consolidated_df.reset_index(inplace=True)

# Display the consolidated DataFrame
consolidated_df

I also computed 2 "new" metrics:

  1. Number of collector errors
    1. This is just how many of the samples in a run had a collector error.
    2. If sample X in a run has a collector error, all subsequent samples in that run will report having a collector error (i.e. errors are only cleared at the start of a new run).
    3. This means it gives a metric for how long a run lasted before seeing a collector error.
  2. Time to Stability
    1. This is how many samples in a run it took before the data rate stabilized.
    2. Mathematically, if we take N samples per run, this is the smallest index i such that \forall j, \; i \leq j \leq N: \text{DataRate}[j] \geq 0.99 \cdot \text{DataRate}[N]
      1. Really I should also require \text{DataRate}[j] \leq 1.01 \cdot \text{DataRate}[N], but I didn't, and I don't imagine it has much effect on the metric result shown below.
      2. I retroactively added the \text{DataRate}[j] \leq 1.01 \cdot \text{DataRate}[N] condition. See plots below.
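
The retroactively added two-sided check can be sketched as follows (the helper name is mine; it uses the same tolerance and window defaults as the one-sided function above):

```python
import numpy as np
import pandas as pd

def time_to_stability_two_sided(data_rates, tolerance=0.01, window=3):
    """First index i such that `window` consecutive samples starting at i
    all lie within +/- tolerance of the final (steady-state) value."""
    final_value = data_rates.iloc[-1]
    lower = final_value * (1 - tolerance)
    upper = final_value * (1 + tolerance)

    for i in range(len(data_rates) - window + 1):
        chunk = data_rates.iloc[i:i + window]
        if np.all((chunk >= lower) & (chunk <= upper)):
            return i
    return len(data_rates)  # never stabilized

# Example: the rate ramps up, overshoots, then settles near 100.
rates = pd.Series([50, 80, 103, 100, 99.5, 100.2, 100])
print(time_to_stability_two_sided(rates))  # → 3 (the one-sided check would return 2, missing the overshoot)
```

The difference matters exactly for runs that overshoot the steady-state rate before settling, which the one-sided check counts as already stable.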

25/03/2025 13:58

Here are the plots from the last round of analysis, just for comparison purposes. They don't give much insight into what's causing the lower data rates:

f0e4404bacfa0902695d028eb2b35a40.png
ffdd8fec63d718741090f10e128fd799.png
c5b444b148e5de3c9d6b4bf00498c9a2.png
2cd561bc36625fba029cc4c7e308b88f.png
6fa9c7123eee3dba4fb41ef8c3a64072.png
51fb6aa1af703643d12efb3a1f10d686.png
4bde8b14064aacc98a00cbfce801f42f.png
Note: The reason some of the data points are outside the 55 MB/s expected data rate range, despite the cut I made to exclude them earlier, is as follows:
When I made the cut, I used this data rate calculation:
\text{Data Rate (B/s)} \approx \text{Trigger rate} \cdot (\text{N}_\text{channels} \cdot \text{N}_\text{windows} \cdot (\text{Packet Length = 80 bytes}) + (\text{Event Header+Footer = 28 bytes}))
However, this is a mistake; it should be:
\text{Data Rate (B/s)} \approx \text{Trigger rate} \cdot (\text{N}_\text{channels} \cdot \text{N}_\text{windows} \cdot (\text{Packet Length = 80 bytes}) + (\text{Event Header+Footer = 34 bytes}) + (\text{Timing Data = 64 bytes}))
So our cut was off by a little bit.
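
As a sanity check, the size of that correction can be computed directly. The constants come from the formulas above; the trigger rate, channel, and window values here are arbitrary examples:

```python
def expected_data_rate_bps(trigger_rate_hz, n_channels, n_windows,
                           packet_len=80, overhead=34 + 64):
    """Expected data rate in bytes/s: per-event packet payload plus fixed per-event overhead."""
    return trigger_rate_hz * (n_channels * n_windows * packet_len + overhead)

# The old (wrong) cut used overhead = 28 bytes; corrected is 34 + 64 = 98 bytes.
old = expected_data_rate_bps(10_000, 8, 4, overhead=28)
new = expected_data_rate_bps(10_000, 8, 4)
print(old, new, new - old)  # the difference is trigger_rate * 70 bytes/s
```

So the cut underestimated the rate by 70 bytes per event, a small fraction of the per-event payload at large window/channel counts, but enough to push borderline points past 55 MB/s.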

7311e22e5da4c9c57955a95befb5f11e.png


25/03/2025 14:05

Here are some "new" plots that are a bit more telling:

Collector error analysis:
c705389407557ee9cc63db62f3494782.png

We can see how collector errors depend on our parameters. For some reason, they were very prevalent with 2 channels. But the thing most correlated with collector errors is the expected data rate.

Time to stability analysis

Using just the lower bound, \text{DataRate}[j] \geq 0.99 \cdot \text{DataRate}[N]:
bef431ffaa8ee126560a53dd0ddf3ec5.png

With both bounds, 0.99 \cdot \text{DataRate}[N] \leq \text{DataRate}[j] \leq 1.01 \cdot \text{DataRate}[N]:
bbf78552e1d627551d2a380911f31ee3.png

We can see how the time for the data rate to stabilize depends on our parameters. Again, it's most correlated with the expected data rate.

Average data rate with uncertainty vs parameters
0aa5eb949fc0b49377b9432ecba3ae8c.png

We can see how the data rate depends on our parameters. We see artifacts of "skipped" events in there.

Normalized average data rate with uncertainty vs parameters
f0c3146558e5d9ebff55454fbf4f1993.png

Same plot as above, just normalized as \frac{\text{average data rate}}{\text{expected data rate}}. This gives insight into which parameters are problematic. However, this "expected data rate" calculation has issues, because I don't know how to correctly account for all the data going into MIDAS. I.e. the logger logs some additional data (such as bank names, indices, etc.) that skews the actual result upwards. As a result, I believe a lot of these events are actually "normal"; they were just picked out by my cuts above due to this lack of understanding.

Normalized average event rate with uncertainty vs parameters
0fba32ad943cb3899172c8f4d280ec99.png

This is similar to the plot above, except we're plotting the normalized event rate, i.e. \frac{\text{average event rate}}{\text{input trigger frequency}}. I believe this plot is slightly more telling than the one above, because we really expect every one of these data points to be on the red dotted line; otherwise we're missing events. Overall, it's unclear what's causing events to be missed. It's most correlated with the expected data rate.


28/03/2025 16:22

I did a longer run where I don't just look at the failure modes, to get more data on the "working modes". What I find is there are actually more errors than expected. I.e. the 4-second tests did not reveal as many errors as the 60-second tests. I suspect this may have to do with the choice of the collector's "time_threshold" parameter. If it's not set properly, the collector can fill up. Below are some plots (very similar to above).
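
A toy back-of-the-envelope for that suspicion (all numbers here are hypothetical, and the real collector's buffering is more complex): if events arrive slightly faster than they are drained, the backlog grows roughly linearly, so a short test can end before the buffer ever fills while a longer run hits the limit.

```python
def seconds_until_full(event_rate_hz, drain_rate_hz, buffer_capacity_events):
    """Toy model: backlog grows at (arrival - drain) events/s.
    Returns time in seconds until the buffer fills, or None if it never fills."""
    net_rate = event_rate_hz - drain_rate_hz
    if net_rate <= 0:
        return None  # the collector keeps up indefinitely
    return buffer_capacity_events / net_rate

# Hypothetical numbers: 10 kHz in, 9.5 kHz drained, 30k-event buffer.
print(seconds_until_full(10_000, 9_500, 30_000))  # → 60.0, visible in 60 s runs but not 4 s tests
```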

8114bc3799011f4c559c659e153686ed.png
5a9b9c413a5d8c1304445b99fff8ac59.png
c37816919a5bab2b186c88b3accead02.png
19408f25e8c83acf31b87e885a00e180.png
9d3026fc9774598ce393adc7d022f3e2.png
6547094f660ac271f51b6dd0e9841306.png
cf4f912b04810035362852cc6b8f8cc7.png
fa1f54b4eff24e624310a57cc4a41c1f.png
26b539cc81176e41a43072add524af9a.png
93d9af08f3694c4333fe378502e225b6.png
1aead2f8abd592c2a6934444b515df88.png
9ed330a1c4fe65c442b829902e1d8855.png
79d34970d01b0d65a40c0824ad05934b.png